Combined Group and Exclusive Sparsity for Deep Neural Networks
Department of Computer Science and Engineering

The number of parameters in a deep neural network is usually very large, which increases its learning capacity but also hinders its scalability and practicality due to memory/time inefficiency and overfitting. To resolve this issue, we propose a sparsity regularization method that exploits both positive and negative correlations among the features to enforce sparsity in the network, and at the same time removes redundancies among the features to fully utilize the capacity of the network. Specifically, we propose an exclusive sparsity regularization based on the (1,2)-norm, which promotes competition for features between different weights, thus enforcing them to fit disjoint sets of features. We further combine the exclusive sparsity with group sparsity based on the (2,1)-norm, to promote both sharing and competition for features in the training of a deep neural network. We validate our method on multiple public datasets, and the results show that it obtains more compact and efficient networks while also improving performance over the base networks with full weights, as opposed to existing sparsity regularizations that often gain efficiency at the expense of prediction accuracy.
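The combined regularizer described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the column-wise grouping and the mixing weight `mu` are assumptions chosen for clarity.

```python
import numpy as np

def combined_sparsity(W, mu=0.5):
    """Combined group + exclusive sparsity penalty for a weight matrix W.

    Groups are taken to be the columns of W (one group per input feature);
    both this grouping and the mixing weight `mu` are illustrative choices.

    - Group sparsity, (2,1)-norm: sum over groups of the L2 norm,
      encouraging whole groups to be zeroed out (feature sharing).
    - Exclusive sparsity, (1,2)-norm: sum over groups of the squared
      L1 norm, promoting competition for features within each group.
    """
    group_term = np.sum(np.linalg.norm(W, ord=2, axis=0))          # sum_g ||w_g||_2
    exclusive_term = 0.5 * np.sum(np.sum(np.abs(W), axis=0) ** 2)  # (1/2) sum_g ||w_g||_1^2
    return mu * group_term + (1.0 - mu) * exclusive_term
```

In practice a penalty like this would be added to the training loss with a regularization coefficient, so gradient descent drives some groups toward zero while spreading the surviving weights across disjoint features.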
Efficient Video Representation Learning via Masked Video Modeling with Motion-centric Token Selection
Self-supervised Video Representation Learning (VRL) aims to learn
transferrable representations from uncurated, unlabeled video streams that
could be utilized for diverse downstream tasks. With recent advances in Masked
Image Modeling (MIM), in which the model learns to predict randomly masked
regions in the images given only the visible patches, MIM-based VRL methods
have emerged and demonstrated their potential by significantly outperforming
previous VRL methods. However, they require an excessive amount of computations
due to the added temporal dimension. This is because existing MIM-based VRL
methods overlook spatial and temporal inequality of information density among
the patches in arriving videos by resorting to random masking strategies,
thereby wasting computations on predicting uninformative tokens/frames. To
tackle these limitations of Masked Video Modeling, we propose a new token
selection method that masks out the more important tokens according to the
object's motions in an online manner, which we refer to as Motion-centric Token
Selection. Further, we present a dynamic frame selection strategy that allows
the model to focus on informative and causal frames with minimal redundancy. We
validate our method on multiple benchmark datasets and Ego4D, showing that
the model pre-trained with our proposed method significantly outperforms
state-of-the-art VRL methods on downstream tasks such as action recognition
and object state change classification, while largely reducing memory
requirements during pre-training and fine-tuning.
Comment: 15 pages
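The motion-centric selection idea above can be illustrated with a toy sketch. This is not the paper's implementation: scoring patches by mean absolute frame difference and the specific patch size and masking ratio are assumptions made for the example.

```python
import numpy as np

def motion_centric_mask(frames, patch=16, mask_ratio=0.75):
    """Toy sketch of motion-centric token selection: score each patch by
    the magnitude of the frame difference inside it, then mask the
    highest-motion patches so the model must predict the most informative
    tokens instead of random ones.

    frames: (T, H, W) grayscale video, H and W divisible by `patch`.
    Returns a boolean (T-1, H//patch, W//patch) mask, True = masked token.
    """
    diff = np.abs(np.diff(frames.astype(np.float32), axis=0))  # (T-1, H, W)
    t, h, w = diff.shape
    # Per-patch motion score: mean absolute frame difference in each patch.
    scores = diff.reshape(t, h // patch, patch, w // patch, patch).mean(axis=(2, 4))
    flat = scores.reshape(t, -1)
    k = int(mask_ratio * flat.shape[1])
    # Mask the top-k highest-motion tokens in each frame pair.
    idx = np.argsort(flat, axis=1)[:, ::-1][:, :k]
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask.reshape(scores.shape)
```

Because the scores depend only on consecutive frame differences, a selection rule of this shape can run in an online manner over an arriving video stream, which matches the setting the abstract describes.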
Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models
In an ever-evolving world, the dynamic nature of knowledge presents
challenges for language models that are trained on static data, leading to
outdated encoded information. However, real-world scenarios require models not
only to acquire new knowledge but also to overwrite outdated information with
updated facts. To address this under-explored issue, we introduce EvolvingQA, a
temporally evolving question-answering benchmark designed
for training and evaluating LMs on an evolving Wikipedia database, where the
construction of our benchmark is automated with our pipeline using large
language models. Our benchmark incorporates question-answering as a downstream
task to emulate real-world applications. Through EvolvingQA, we uncover that
existing continual learning baselines have difficulty in updating and
forgetting outdated knowledge. Our findings suggest that the models fail to
learn updated knowledge due to small weight gradients. Furthermore, we
show that the models struggle most when providing numerical or temporal
answers to questions asking for updated knowledge. Our work aims to model the
dynamic nature of real-world information, offering a robust measure for the
evolution-adaptability of language models.
Comment: 14 pages, 5 figures, 5 tables; accepted at the NeurIPS Syntheticdata4ML
workshop, 202
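The slice-wise evaluation that the benchmark motivates can be sketched as below. The field names and category labels here are illustrative assumptions, not EvolvingQA's actual schema: the point is to score accuracy separately per knowledge category so that failures to overwrite outdated facts are not hidden in an overall average.

```python
def evaluate_by_category(predictions, examples):
    """Toy per-category QA accuracy (categories and fields are assumed):
    score unchanged, new, and updated knowledge separately, exposing
    whether a continually trained model actually overwrites stale facts.
    """
    totals, correct = {}, {}
    for pred, ex in zip(predictions, examples):
        cat = ex["category"]  # e.g. "unchanged", "new", "updated"
        totals[cat] = totals.get(cat, 0) + 1
        # Exact-match scoring after light normalization.
        if pred.strip().lower() == ex["answer"].strip().lower():
            correct[cat] = correct.get(cat, 0) + 1
    return {c: correct.get(c, 0) / totals[c] for c in totals}
```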